CCBR-620

Maggie Cam

Oct 15, 2015

Background

To detect different gene expression in a small subpopulation (<1%) of skin, call Merkel cell, upon shh, fgf20, and edar mutation. Aim to analyze the connection between the three different pathways, and try to get clue of the genes participating in Merkel cell development.

E15.5 embryo skin from 2 mutant lines:

edar mutant = mutation in gene ectodysplasin-A receptor, Single point mutation (Mutation details: This allele involves a G to A transition mutation at nucleotide 1,135 that causes the amino acid change: glutamate to lysine at position 379 (E379K). (J:56496))
phenotype (http://www.informatics.jax.org/allele/allgenoviews/MGI:1856018) = abnormal coat/hair morphology, darkened coat color

fgf20 mutant = fgf20-β-galactosidase knock-in allele phenotype= no guard hair in adult mouse back skin

Data Processing

## Bioconductor version 3.2 (BiocInstaller 1.20.0), ?biocLite for help

This part of the program was run on biowulf, and data transferred to a local directory

#Command line version
module load subread
x=$(ls *.bam)
featureCounts -p -T 8 -s 2 -p -t exon -g gene_id -a /data/maggiec/RNASeq/Genomes/mm10/gencode.vM4.all.gtf -o counts_ss.txt $x

#Used R version:
gtf="data/maggiec/RNASeq/Genomes/mm10/gencode.vM4.all.gtf"
targets <- readTargets()
fc <- featureCounts(files=targets$bam,isGTFAnnotationFile=TRUE,nthreads=32,
      annot.ext=gtf,GTF.attrType="gene_name",strandSpecific=2,isPairedEnd=TRUE)
x <- DGEList(counts=fc$counts, genes=fc$annotation)

Load data from local directory:

setwd("/Users/maggiec/GitHub/Maggie/ccbr620/")
#load("data/ssData.RData")
#fc=fc_ss
load("data/RNASeq_orig.RData")
ls()
## [1] "fc"      "gtf"     "targets"
#Change the Phenotype (switched)
targets$Phenotype[c(7,8,10)] <- "fgf20lz/lz mutant"
targets$Phenotype[c(9,11,12)] <- "fgf20lz/+ control"
targets
##               bam         Phenotype Batch
## 1  Sample_e10.bam    edar/+ control     1
## 2   Sample_e1.bam    edar/+ control     2
## 3   Sample_e2.bam    edar/+ control     2
## 4   Sample_e4.bam  edar/edar mutant     2
## 5   Sample_e5.bam  edar/edar mutant     2
## 6   Sample_e9.bam  edar/edar mutant     1
## 7   Sample_f1.bam fgf20lz/lz mutant     3
## 8   Sample_f2.bam fgf20lz/lz mutant     3
## 9   Sample_f4.bam fgf20lz/+ control     3
## 10  Sample_f5.bam fgf20lz/lz mutant     3
## 11  Sample_f6.bam fgf20lz/+ control     3
## 12  Sample_f7.bam fgf20lz/+ control     3
fc$stat
##                        Status Sample_e10.bam Sample_e1.bam Sample_e2.bam
## 1                    Assigned       30137063      38131187      38018334
## 2        Unassigned_Ambiguity         338418        423143        427446
## 3     Unassigned_MultiMapping        5540811       6724855       7144799
## 4       Unassigned_NoFeatures        2332980       2896013       3515618
## 5         Unassigned_Unmapped              0             0             0
## 6   Unassigned_MappingQuality              0             0             0
## 7  Unassigned_FragementLength              0             0             0
## 8          Unassigned_Chimera              0             0             0
## 9        Unassigned_Secondary              0             0             0
## 10     Unassigned_Nonjunction              0             0             0
## 11       Unassigned_Duplicate              0             0             0
##    Sample_e4.bam Sample_e5.bam Sample_e9.bam Sample_f1.bam Sample_f2.bam
## 1       39859361      37639209      21918421      29010380      31527017
## 2         464930        429850        251806        328766        361974
## 3        7824053       7377360       4328405       5049329       5805782
## 4        3938754       3294804       1846191       3223421       3212604
## 5              0             0             0             0             0
## 6              0             0             0             0             0
## 7              0             0             0             0             0
## 8              0             0             0             0             0
## 9              0             0             0             0             0
## 10             0             0             0             0             0
## 11             0             0             0             0             0
##    Sample_f4.bam Sample_f5.bam Sample_f6.bam Sample_f7.bam
## 1       20773860      30420667      32484170      31229089
## 2         226857        343845        360251        353216
## 3        3820429       5435442       5666179       5505908
## 4        1941022       3457653       3716923       3263560
## 5              0             0             0             0
## 6              0             0             0             0
## 7              0             0             0             0
## 8              0             0             0             0
## 9              0             0             0             0
## 10             0             0             0             0
## 11             0             0             0             0
fc1=mat=fc$counts
tfc1=t(fc1)
filter <- apply(fc1, 1, function(x) length(x[x>5])>=1)
fc1filt <- fc1[filter,]
genes <- rownames(fc1filt)

QC Check: Look at raw signal distribution and median expression levels

Data is normalized by TMM and quantile normalization

## [1] 16253    12

Run Voom

#Do analysis for entire group
celltype <- factor(targets$Phenotype)
Batch <- factor(targets$Batch)
cell_rep=paste(celltype,Batch,sep=".")
design <- model.matrix(~0+celltype)
design
##    celltypeedar/+ control celltypeedar/edar mutant
## 1                       1                        0
## 2                       1                        0
## 3                       1                        0
## 4                       0                        1
## 5                       0                        1
## 6                       0                        1
## 7                       0                        0
## 8                       0                        0
## 9                       0                        0
## 10                      0                        0
## 11                      0                        0
## 12                      0                        0
##    celltypefgf20lz/+ control celltypefgf20lz/lz mutant
## 1                          0                         0
## 2                          0                         0
## 3                          0                         0
## 4                          0                         0
## 5                          0                         0
## 6                          0                         0
## 7                          0                         1
## 8                          0                         1
## 9                          1                         0
## 10                         0                         1
## 11                         1                         0
## 12                         1                         0
## attr(,"assign")
## [1] 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$celltype
## [1] "contr.treatment"
dev.off()
## null device 
##           1
v <- voom(x,design,plot=TRUE,normalize="quantile")

After Normalization

## Using  as id variables

unnamed_chunk_6snapshot
You must enable Javascript to view this page properly.

Preprocessing of edar group (voom): 2 batches of pups (voom)

##              bam        Phenotype Batch
## 1 Sample_e10.bam   edar/+ control     1
## 2  Sample_e1.bam   edar/+ control     2
## 3  Sample_e2.bam   edar/+ control     2
## 4  Sample_e4.bam edar/edar mutant     2
## 5  Sample_e5.bam edar/edar mutant     2
## 6  Sample_e9.bam edar/edar mutant     1
##   (Intercept) Batch2 celltypeedar/edar mutant
## 1           1      0                        0
## 2           1      1                        0
## 3           1      1                        0
## 4           1      1                        1
## 5           1      1                        1
## 6           1      0                        1
## attr(,"assign")
## [1] 0 1 2
## attr(,"contrasts")
## attr(,"contrasts")$Batch
## [1] "contr.treatment"
## 
## attr(,"contrasts")$celltype
## [1] "contr.treatment"

unnamed_chunk_7snapshot
You must enable Javascript to view this page properly.

Preprocessing of Fgf group (voom): 1 batch of pups

##              bam         Phenotype Batch
## 7  Sample_f1.bam fgf20lz/lz mutant     3
## 8  Sample_f2.bam fgf20lz/lz mutant     3
## 9  Sample_f4.bam fgf20lz/+ control     3
## 10 Sample_f5.bam fgf20lz/lz mutant     3
## 11 Sample_f6.bam fgf20lz/+ control     3
## 12 Sample_f7.bam fgf20lz/+ control     3
##   (Intercept) celltypefgf20lz/lz mutant
## 1           1                         1
## 2           1                         1
## 3           1                         0
## 4           1                         1
## 5           1                         0
## 6           1                         0
## attr(,"assign")
## [1] 0 1
## attr(,"contrasts")
## attr(,"contrasts")$celltype
## [1] "contr.treatment"

unnamed_chunk_8snapshot
You must enable Javascript to view this page properly.

Genes of Interest

####Statistical Analysis of Edar group (lmfit)

#Run analysis of edar group:
fit1 <- lmFit(v1,design1) 
fit1 <- eBayes(fit1) 
options(digits=3) 
topgenes1=topTable(fit1,coef=3,n=50,sort.by="p")
FC = 2^(fit1$coefficients[,3])
FC = ifelse(FC<1,-1/FC,FC)
finalres=topTable(fit1,coef=3,sort="none",n=Inf)

Volcano Plot: Edar Group

Heatmap: Edar Group Before and after Batch Removal

Barchart: Edar Group Before and after Batch Removal

PCA Plot: After Batch Removal

unnamed_chunk_14snapshot
You must enable Javascript to view this page properly.

Statistical Analysis of Fgf group

design2 <- model.matrix(~celltype)
fit2 <- lmFit(v2,design2) 
fit2 <- eBayes(fit2) 
options(digits=3) 
topgenes2=topTable(fit2,coef=2,n=50,sort.by="p")
finalres2=topTable(fit2,coef=2,sort="none",n=Inf)

FC2 = 2^(fit2$coefficients[,2])
FC2 = ifelse(FC2<1,-1/FC2,FC2)

Volcano Plot: Fgf Group

Heatmap: Fgf Group

Boxplots: Fgf Group

Provenance:

## R version 3.2.1 (2015-06-18)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.4 (Yosemite)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] d3heatmap_0.6.1      googleVis_0.5.10     reshape_0.8.5       
##  [4] knitr_1.11           rgl_0.95.1367        ggplot2_1.0.1       
##  [7] edgeR_3.12.0         limma_3.26.0         Rsubread_1.20.0     
## [10] BiocInstaller_1.20.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.1        magrittr_1.5       MASS_7.3-44       
##  [4] munsell_0.4.2      colorspace_1.2-6   stringr_1.0.0     
##  [7] plyr_1.8.3         tools_3.2.1        grid_3.2.1        
## [10] gtable_0.1.2       png_0.1-7          htmltools_0.2.6   
## [13] yaml_2.1.13        digest_0.6.8       RJSONIO_1.3-0     
## [16] RColorBrewer_1.1-2 reshape2_1.4.1     formatR_1.2.1     
## [19] htmlwidgets_0.5    base64enc_0.1-3    evaluate_0.8      
## [22] rmarkdown_0.8.1    labeling_0.3       stringi_0.5-5     
## [25] scales_0.3.0       jsonlite_0.9.17    proto_0.3-10